1 Fast, effective code generation in a just-in-time Java compiler

Ali-Reza Adl-Tabatabai, Michał Cierniak, Guei-Yuan Lueh, Vishesh M. Parikh, James M.
Stichnoth

May 1998 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 1998 conference
on Programming language design and implementation, volume 33 issue 5

Full text available: pdf(1.44 MB)
Additional Information: full citation, abstract, references, citings, index terms

A "Just-In-Time" (JIT) Java compiler produces native code from Java byte code instructions
during program execution. As such, compilation speed is more important in a Java JIT 
compiler than in a traditional compiler, requiring optimization algorithms to be lightweight 
and effective. We present the structure of a Java JIT compiler for the Intel Architecture, 
describe the lightweight implementation of JIT compiler optimizations (e.g., common 
subexpression elimination, register allocation, and elim ... 

2 The priority-based coloring approach to register allocation 
Fred C. Chow, John L. Hennessy 

October 1990 ACM Transactions on Programming Languages and Systems (TOPLAS),
volume 12 issue 4

Full text available: pdf
Additional Information: full citation, abstract, references, citings, index terms, review



Global register allocation plays a major role in determining the efficacy of an optimizing 
compiler. Graph coloring has been used as the central paradigm for register allocation in 
modern compilers. A straightforward coloring approach can suffer from several 
shortcomings. These shortcomings are addressed in this paper by coloring the graph using a 
priority ordering. A natural method for dealing with the spilling emerges from this approach. 
The detailed algorithms for a priority-based colori ... 



3 Compilers II: Inter-procedural stacked register allocation for Itanium® like architecture
Liu Yang, Sun Chan, G. R. Gao, Roy Ju, Guei-Yuan Lueh, Zhaoqing Zhang 
June 2003 Proceedings of the 17th annual international conference on
Supercomputing

Full text available: pdf(478.20 KB)
Additional Information: full citation, abstract, references, index terms

A hardware managed register stack, Register Stack Engine (RSE), is implemented in 
Itanium® architecture to provide a unified and flexible register structure to software. The 
compiler allocates each procedure a register stack frame with its size explicitly specified 
using an alloc instruction. When the total number of registers used by the procedures on the 
call stack exceeds the number of physical registers, RSE performs automatically register
overflows and fills to ensure that the c ... 

Keywords: hot region, hotspot, inter-procedural stacked register allocation, quota 
assignment, register allocation 
4 Towards a family of languages for the design and implementation of machine
architectures

Subrata Dasgupta, Marius Olafsson 

April 1982 Proceedings of the 9th annual symposium on Computer Architecture 

Full text available: pdf(759.02 KB)
Additional Information: full citation, abstract, references, citings, index terms

In recent years, increases in complexity of hardware/firmware systems, and the concern for 
systems reliability, have resulted in growing interest in methodologies and tools for the 
design, description and verification of computer systems. A vital component of any such 
design methodology is the language used for representing the design. In the case of 
particularly complex systems the design process may involve a succession of stages each of 
which represents the system at a particular level of ... 

5 Hot cold optimization of large Windows/NT applications
Robert Cohn, P. Geoffrey Lowney 

December 1996 Proceedings of the 29th annual ACM/IEEE international symposium on 
Microarchitecture 

Full text available: pdf, Publisher Site
Additional Information: full citation, abstract, references, citings, index terms

A dynamic instruction trace often contains many unnecessary instructions that are required 
only by the unexecuted portion of the program. Hot-cold optimization (HCO) is a technique 
that realizes this performance opportunity. HCO uses profile information to partition each 
routine into frequently executed (hot) and infrequently executed (cold) parts. Unnecessary 
operations in the hot portion are removed, and compensation code is added on transitions 
from hot to cold as needed. We evaluate HCO on a ... 

Keywords: optimization, profile, NT, register allocation



6 A practical method for code generation based on exhaustive search 
David W. Krumme, David H. Ackley 

June 1982 ACM SIGPLAN Notices, Proceedings of the 1982 SIGPLAN symposium on
Compiler construction, volume 17 issue 6

Full text available: pdf
Additional Information: full citation, abstract, references, citings, index terms

An original method for code generation has been developed in conjunction with the 
construction of a compiler for the C programming language on the DEC-10 computer. The 
method is comprehensive, determining evaluation order and doing register allocation and 
instruction selection simultaneously. It uses exhaustive search rather than heuristics, and is 
table-driven, with most machine-specific information isolated in the tables. Testing and 
evaluation have shown that the method is effective, tha ... 

7 Cray Pascal

N. H. Madhavji, I. R. Wilson 

June 1982 ACM SIGPLAN Notices, Proceedings of the 1982 SIGPLAN symposium on
Compiler construction, volume 17 issue 6

Full text available: pdf(731.22 KB)
Additional Information: full citation, abstract, references, citings, index terms

This paper presents an investigation of the design decisions taken in the implementation of 
a compiler for Pascal on the CRAY-1 computer. The structured nature of Pascal statements 
and data structures is contrasted with the "powerful computing engine" nature of the CRAY-1
hardware. The accepted views of Pascal as a simple one-pass language and the CRAY-1 as a 
vector processor are laid aside in favour of a multi-pass approach, taking account of the 
machine's scalar capabilities. The project ... 

Keywords: CRAY-1, Code optimisation, Compilation, Pascal, Vector processors 



8 Porting the Zed compiler

G. B. Bonkowski, W. M. Gentleman, M. A. Malcolm 

August 1979 ACM SIGPLAN Notices, Proceedings of the 1979 SIGPLAN symposium on
Compiler construction, volume 14 issue 8

Full text available
Additional Information: full citation, abstract, references, citings, index terms

Zed is the base language used to implement the portable realtime operating system Thoth 
[7], and to write commands, utilities, application programs, and other software which run 
under Thoth. (Zed is similar to C, although language details are not important in this paper.) 
One of the founding principles of Thoth is our experience that the hardest problems in 
porting programs usually arise when interfacing to different operating systems. By porting 
the whole operating system first, we ensure t ... 

9 16-bit vs. 32-bit instructions for pipelined microprocessors 
John Bunda, Don Fussell, W. C. Athas, Roy Jenevein 

May 1993 ACM SIGARCH Computer Architecture News, Proceedings of the 20th
annual international symposium on Computer architecture, volume 21 issue 2

Full text available: pdf
Additional Information: full citation, abstract, references, citings, index terms

In any stored-program computer system, information is constantly transferred between the 
memory and the instruction processor. Machine instructions are a major portion of this 
traffic. Since transfer bandwidth is a limited resource, inefficiency in the encoding of 
instruction information (low code density) can have definite hardware and performance 
costs. Starting with a parameterized baseline RISC design, we compare performance for two 
instruction encodings for the same instruct ... 

10 A retargetable register allocation framework for embedded processors 
Jean-Marc Daveau, Thomas Thery, Thierry Lepley, Miguel Santana 

June 2004 ACM SIGPLAN Notices, Proceedings of the 2004 ACM SIGPLAN/SIGBED
conference on Languages, compilers, and tools, volume 39 issue 7

Full text available: pdf(1.03 MB)
Additional Information: full citation, abstract, references, index terms

This paper describes the FlexCC2 register allocation framework. FlexCC2 is an optimizing 
retargetable C compiler for embedded processors, and in particular for DSP processors. 
Embedded processors often contain features such as irregular and constrained register sets 
that complicate register allocation, making traditional methods inefficient. In this paper, we 
present a register allocation framework specifically tailored for embedded processor 
specificities. This framework has been integrated in ... 

Keywords: embedded processors, register allocation 
11 Engineering a production code generator
John Crawford 

June 1982 ACM SIGPLAN Notices, Proceedings of the 1982 SIGPLAN symposium on
Compiler construction, volume 17 issue 6

Full text available: pdf(875.84 KB)
Additional Information: full citation, abstract, references, citings, index terms

This paper describes the structure of a code generator formed by merging the best aspects 
of three code generation techniques: Graham-Glanville parser-driven code generation [G] 
[GG] [GR] [HG], the register allocation/spill mechanism from the Portable C Compiler [J], 
and a code template expander [W]. The Graham-Glanville method was modified to use a 
standard LALR parser and table builder, and the register allocation method was extended in 
several significant ways in order to make optimal us ... 

12 ASHs: Application-specific handlers for high-performance messaging 
Deborah A. Wallach, Dawson R. Engler, M. Frans Kaashoek 
August 1996 ACM SIGCOMM Computer Communication Review, Conference
proceedings on Applications, technologies, architectures, and protocols
for computer communications, volume 26 issue 4

Full text available
Additional Information: full citation, abstract, references, citings, index terms

Application-specific safe message handlers (ASHs) are designed to provide applications with 
hardware-level network performance. ASHs are user-written code fragments that safely and
efficiently execute in the kernel in response to message arrival. ASHs can direct message 
transfers (thereby eliminating copies) and send messages (thereby reducing send-response 
latency). In addition, the ASH system provides support for dynamic integrated layer 
processing (thereby eliminating duplicate message ... 



13 Application specific processors: A low power architecture for embedded perception 
Binu Mathew, Al Davis, Mike Parker 

September 2004 Proceedings of the 2004 international conference on Compilers, 
architecture, and synthesis for embedded systems 

Full text available: pdf(310.49 KB)
Additional Information: full citation, abstract, references, index terms

Recognizing speech, gestures, and visual features are important interface capabilities for 
future embedded mobile systems. Unfortunately, the real-time performance requirements of 
complex perception applications cannot be met by current embedded processors and often 
even exceed the performance of high performance microprocessors whose energy 
consumption far exceeds embedded energy budgets. Though custom ASICs provide a 
solution to this problem, they incur expensive and lengthy design cycles and ... 

Keywords: VLIW, computer vision, embedded systems, low power design, perception, 
speech recognition, stream processor 




14 A Fortran compiler for the FPS-164 scientific computer 
Roy F. Touzeau 

June 1984 ACM SIGPLAN Notices, Proceedings of the 1984 SIGPLAN symposium on
Compiler construction, volume 19 issue 6

Full text available: pdf(863.50 KB)
Additional Information: full citation, references, citings



15 Dynamic analysis: The design and implementation of FIT: a flexible instrumentation 
toolkit 

Bruno De Bus, Dominique Chanet, Bjorn De Sutter, Ludo Van Put, Koen De Bosschere 
June 2004 Proceedings of the ACM-SIGPLAN-SIGSOFT workshop on Program analysis
for software tools and engineering

Full text available: pdf(252.10 KB)
Additional Information: full citation, abstract, references, index terms

This paper presents FIT, a Flexible open-source binary code Instrumentation Toolkit. Unlike 
existing tools, FIT is truly portable, with existing backends for the Alpha, x86 and ARM 
architectures and the Tru64Unix, Linux and ARM Firmware execution environments. This 
paper focuses on some of the problems that needed to be addressed for providing this 
degree of portability. It also discusses the trade-off between instrumentation precision and 
low overhead. 

Keywords: code compaction, performance code abstraction 



16 Compiling Prolog into microcode: a case study using the NCR/32-000 
B. Fagin, Y. N. Patt, V. Srini, A. Despain 

December 1985 ACM SIGMICRO Newsletter, Proceedings of the 18th annual workshop
on Microprogramming, volume 16 issue 4

Full text available: pdf
Additional Information: full citation, abstract, references, citings, index terms

A proven method of obtaining high performance for Prolog programs is to first translate 
them into the instruction set of Warren's Abstract Machine, or W-code [1]. From that point, 
there are several models of execution available. This paper describes one of them: the
compilation of W-code directly into the vertical microcode of a general purpose host 
processor, the NCR/32-000. The result is the fastest functioning Prolog system known to the 
authors. We describe the implementation, provide b ... 

17 ASHs: application-specific handlers for high-performance messaging 
Deborah A. Wallach, Dawson R. Engler, M. Frans Kaashoek 

August 1997 IEEE/ACM Transactions on Networking (TON), volume 5 issue 4

Full text available: pdf(174.62 KB)
Additional Information: full citation, references, index terms



Keywords: computer networks, dynamic code generation, modular computer systems, 
operating systems, protocols, software protection, user-level networking 



18 Packet types: abstract specification of network protocol messages
Peter J. McCann, Satish Chandra

August 2000 ACM SIGCOMM Computer Communication Review, Proceedings of the
conference on Applications, Technologies, Architectures, and Protocols for
Computer Communication, volume 30 issue 4

Full text available: pdf(435.48 KB)
Additional Information: full citation, abstract, references, index terms

In writing networking code, one is often faced with the task of interpreting a raw buffer 
according to a standardized packet format. This is needed, for example, when monitoring 
network traffic for specific kinds of packets, or when unmarshaling an incoming packet for 
protocol processing. In such cases, a programmer typically writes C code that understands 
the grammar of a packet and that also performs any necessary byte-order and alignment 
adjustments. Because of the complexity of certain ... 

19 Evaluation of scheduling techniques on a SPARC-based VLIW testbed
Seongbae Park, SangMin Shim, Soo-Mook Moon 

December 1997 Proceedings of the 30th annual ACM/IEEE international symposium on
Microarchitecture

Full text available: pdf(1.40 MB), Publisher Site
Additional Information: full citation, abstract, references, citings, index terms

The performance of Very Long Instruction Word (VLIW) microprocessors depends on the 
close cooperation between the compiler and the architecture. This paper evaluates a set of 
important compilation techniques and related architectural features for VLIW machines. The 
evaluation is performed on a SPARC-based VLIW testbed where gcc-generated optimized 
SPARC code is scheduled into high-performance VLIW code. As a base scheduling compiler, 
we experiment with three core scheduling techniques including ... 

Keywords: SPARC-based VLIW testbed, VLIW microprocessors, Very Long Instruction Word 
microprocessors, all-path speculation, compiler, computer architecture, copies,
gcc-generated optimized SPARC code, high-performance VLIW code, loop unrolling, memory
disambiguation, nongreedy enhanced pipeline scheduling, nonspeculative operations, 
parallel machines, performance, profile-based all-path speculation, renaming, restricted 
speculative loads, scheduling compiler, scheduling techniques, software pipelining, 
speculative operations, trace-based speculation 



20 Direct execution models of processor behavior and performance
Richard M. Fujimoto, William B. Campbell 

December 1987 Proceedings of the 19th conference on Winter simulation 

Full text available: pdf(952.49 KB)
Additional Information: full citation, abstract, references, index terms

This paper discusses a modeling technique for creating efficient instruction level simulation 
models of von Neumann processors. In contrast to traditional approaches which use a 
software interpreter, this technique employs direct execution of application programs on the 
host computer. An assembly language program for the target machine is decompiled to a 
high level language, instrumented, and then recompiled and executed on the host computer. 
A prototype im ... 